Using compatible shape descriptor for lexicon reduction of printed Farsi subwords

Authors

  • Homa Davoudi
  • Ehsanollah Kabir
Abstract

This paper presents a method for lexicon reduction of printed Farsi subwords based on their holistic shape features. Persian subwords are numerous and vary in shape from a single letter to a complex combination of several connected characters, so it is not easy to find one fixed shape descriptor suitable for all of them. We therefore propose selecting the descriptor according to the shape characteristics of the input. To do this, a neural network is trained to predict the appropriate descriptor for the input image. This network is used in the proposed lexicon reduction system to decide which descriptor is used to compare the query image with the lexicon entries. An evaluation on a dataset of Persian subwords supports the effectiveness of treating different query shapes differently.

Keywords: Lexicon reduction, Shape description, Compatible descriptor, Persian, Farsi
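The descriptor-selection idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not name the descriptors, the network architecture, or the distance measure. The two descriptors (`profile`, `zoning`), the aspect-ratio rule in `select_descriptor` (a stand-in for the trained neural network), and the Euclidean distance are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Image = List[List[int]]  # binary image as rows of 0/1, where 1 means ink


def zone_density(img: Image, rows: int, cols: int) -> List[float]:
    """Fraction of ink pixels in each cell of a rows x cols grid."""
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * h // rows, (r + 1) * h // rows
            c0, c1 = c * w // cols, (c + 1) * w // cols
            cells = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


# Hypothetical descriptor pool; the paper's actual descriptors are not given here.
DESCRIPTORS: Dict[str, Callable[[Image], List[float]]] = {
    "profile": lambda img: zone_density(img, 1, 4),  # coarse column profile
    "zoning": lambda img: zone_density(img, 2, 2),   # 2 x 2 zoning
}


def select_descriptor(img: Image) -> str:
    """Stand-in for the trained neural network: a simple aspect-ratio rule."""
    h, w = len(img), len(img[0])
    return "profile" if w >= 2 * h else "zoning"


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_lexicon(query: Image, lexicon: Dict[str, Image], keep: int,
                   selector: Callable[[Image], str] = select_descriptor) -> List[str]:
    """Return the `keep` lexicon labels closest to the query image,
    using the descriptor chosen for this particular query."""
    describe = DESCRIPTORS[selector(query)]
    q = describe(query)
    ranked = sorted(lexicon, key=lambda label: euclidean(q, describe(lexicon[label])))
    return ranked[:keep]
```

The point of the design is that the descriptor is chosen once per query and then applied to both the query and every lexicon entry, so distances are always computed in the same feature space. In a real system the lexicon descriptors would be precomputed for each descriptor type rather than recomputed per query.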

Similar resources

Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs

In the field of word recognition, three approaches are used: isolation, overall shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces problems because of the difficulty of isolating letters and its limited recognition accuracy on texts with low image quality. Therefo...

Detection and compensation of undesirable discontinuities within the farsi/arabic subwords

In this paper, an unexplored subject in the domain of Farsi/Arabic handwritten word preprocessing is introduced. Subwords play a vital role in many applications such as cheque amount recognition, text recognition, lexicon reduction and subword-based word recognition. Correcting the faults that occur in subwords will improve the overall performance of these applications. A subword is a connected-...

Arabic word descriptor for handwritten word indexing and lexicon reduction

Word recognition systems use a lexicon to guide the recognition process in order to improve the recognition rate. However, as the lexicon grows, the computation time increases. In this paper, we present the Arabic word descriptor (AWD) for Arabic word shape indexing and lexicon reduction in handwritten documents. It is formed in two stages. First, the structural descriptor (SD) is computed for ...

A two-step method for recognition of handwritten Farsi words using adaptive blocking of the image gradient

This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed for initially eliminating lexicon entries that are unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...

A Study on Farsi Handwriting Styles for Online Recognition

Knowing the different ways a letter in a word or subword is written in different handwriting styles is very beneficial for recognition, particularly online recognition. In this paper, the TMU-OFS dataset, consisting of 1000 frequent Farsi subwords, is employed to study Farsi handwriting styles. The subwords are grouped based on their delayed strokes and their main bodies, separately. The handwriting style...

Journal title:
  • CoRR

Volume abs/1601.06251

Publication date 2016